Use cuda virtual memory management and merge blocks #36189
Conversation
… auto_growth_v2
auto result =
    paddle::platform::dynload::cuDeviceGet(&device, p.GetDeviceId());
PADDLE_ENFORCE_EQ(
    result, CUDA_SUCCESS,
    platform::errors::Fatal("Call CUDA API cuDeviceGet failed, return %d.",
                            result));
PADDLE_ENFORCE_CUDA_SUCCESS ?
Thanks, fixed.
    result, CUDA_SUCCESS,
    platform::errors::Fatal(
        "Call CUDA API cuDeviceGetAttribute failed, return %d.", result));
} catch (...) {
In which case may it raise an exception?
I see, please add comments on that.
@@ -131,6 +131,20 @@ gpuError_t RecordedCudaMalloc(void **ptr, size_t size, int dev_id);
//! CudaFree with recorded info
void RecordedCudaFree(void *p, size_t size, int dev_id);

#ifdef PADDLE_WITH_CUDA
It seems not needed.
I was not sure whether CUDA_VERSION would be well-defined when building without CUDA, so I guarded it to be safe.
There is #ifdef PADDLE_WITH_CUDA at the beginning of the file, so I think this one may be duplicated.
__macro(cuInit);                                      \
__macro(cuDriverGetVersion);                          \
__macro(cuGetErrorString);                            \
__macro(cuModuleLoadData);                            \
__macro(cuModuleGetFunction);                         \
__macro(cuModuleUnload);                              \
__macro(cuOccupancyMaxActiveBlocksPerMultiprocessor); \
__macro(cuLaunchKernel);                              \
__macro(cuCtxCreate);                                 \
__macro(cuCtxGetCurrent);                             \
__macro(cuDeviceGetCount);                            \
__macro(cuDevicePrimaryCtxGetState);                  \
These duplicate the APIs in the #else branch, so maybe the #else is not needed.
Some of these APIs are only available in CUDA 10.2 and later, so they are guarded with a macro.
I think you can always define the APIs that exist in both version < 10.2 and >= 10.2 without the macro.
paddle/fluid/platform/gpu_info.cc
Outdated
@@ -641,6 +646,30 @@ class RecordedCudaMallocHelper {

uint64_t LimitSize() const { return limit_size_; }

#ifdef PADDLE_WITH_CUDA
#if CUDA_VERSION >= 10020
CUresult cuMemCreate(CUmemGenericAllocationHandle *handle, size_t size,
Better name the member function like other functions, for example, "CreateMem". (Start with upper case)
Done, thanks!
paddle::platform::CUDADeviceGuard guard(place.device);
PADDLE_ENFORCE_CUDA_SUCCESS(cudaMemGetInfo(&actual_avail, &actual_total));

virtual_mem_size_ = (actual_total + granularity_ - 1) & ~(granularity_ - 1);
Why do this?
This reserves a virtual address space whose size exactly matches the GPU's total device memory, so that the one-time virtual address reservation is guaranteed to be large enough.
virtual_mem_size_ = (actual_total + granularity_ - 1) & ~(granularity_ - 1);
This has been changed to use a function call.
block--;
auto pre = block;
block++;
block++;
auto next = block;
block--;
I cannot easily understand it...
Isn't it easy enough to understand with pre and next?
Changed to use std::next and std::prev.
LGTM
PR types
New features
PR changes
Others
Describe
A new allocator (AutoGrowthV2) is implemented on top of the NVIDIA Virtual Memory Management (VMM) mechanism.
With VMM, memory blocks requested from CUDA can be merged (the old AutoGrowth allocator cannot merge memory blocks).